ABSTRACT

Recent investigations reveal an important new class of transient radio phenomena that occur on sub- 
millisecond timescales. Often transient surveys' data volumes are too large to archive exhaustively. 
Instead, an on-line automatic system must excise impulsive interference and detect candidate events in 
real-time. This work presents a case study using data from multiple geographically distributed stations 
to perform simultaneous interference excision and transient detection. We present several algorithms 
that incorporate dedispersed data from multiple sites, and report experiments with a commensal real-
time transient detection system on the Very Long Baseline Array (VLBA). We test the system using
observations of pulsar B0329+54. The multiple-station algorithms enhanced sensitivity for detection
of individual pulses. These strategies could improve detection performance for a future generation of 
geographically distributed arrays such as the Australian Square Kilometre Array Pathfinder and the 
Square Kilometre Array. 

Subject headings: methods: observational — pulsars: general — radio continuum: general 



1. INTRODUCTION 

The radio sky varies over a wide range of timescales
(Cordes et al. 2004). Recent studies have characterized
populations of slow transient radio sources that vary
over timescales from seconds to years (Bower et al. 2007;
Croft et al. 2010; Bannister et al. 2010). Observers have
also discovered an important new class of fast tran-
sient events at millisecond- to sub-millisecond timescales
(Lazio et al. 2009; Cordes et al. 2004). These include
Gamma Ray Bursts (Cameron et al. 2005), Rotating Ra-
dio Transients (RRATs) (McLaughlin et al. 2006), and
unique single-pulse phenomena like the Lorimer Burst
(Lorimer et al. 2007). Fast transients' short durations
imply high-energy coherent processes, giving them sig-
nificant scientific importance. However, few surveys have
specifically targeted fast transients, and with few vali-
dated detections these populations are poorly character-
ized. The challenge has motivated considerable interest
from the radio astronomy and pulsar communities, with
several recent and forthcoming searches for transient sig-
nals in array time series data. Such surveys include
the ATA Fly's Eye (von Korff et al. 2009; Siemion et al.
2010), the LWA transients study (Taylor et al. 2006),
the LOFAR transient campaign (Hessels et al. 2009),
and the ALFA pulsar search (Deneva et al. 2009). In
the near future, a new generation of instruments
with significantly improved survey speed and sensitiv-
ity will begin operations. The Square Kilometre Array
(Cordes & McLaughlin 2003; Hall et al. 2001) and its
precursor projects such as the Australian Square Kilome-
tre Array Pathfinder (ASKAP) (Johnston et al. 2008),
the Murchison Widefield Array (MWA) (Lonsdale et al.
2009), and MeerKAT (Jonas 2009) will open additional
observational parameter space to new and unanticipated
transient sources.

Any transient survey must demonstrate that detected 
events are not of terrestrial origin. Frequency disper- 
sion is strong evidence, but perhaps not conclusive proof 
since terrestrial events may mimic dispersion profiles
(Burke-Spolaor et al. 2010). Localizing a source within
a calibrated image would provide more conclusive ev-
idence, as would coherent dedispersion to resolve its
temporal structure. However, these analyses require ac-
cess to raw antenna voltages and infeasible data stor-
age volumes. Therefore investigators often buffer time
series voltage data just long enough to quickly identify
probable transients, and save only the most promis-
ing data to archival storage for a full coherent analy-
sis (Macquart et al. 2010). Accurate candidate selection
is essential because of limits to storage space and the
time of human analysts to examine detections off-line
(Ellingson 2004). Transient searches must address the
algorithmic challenge of real-time event detection in in- 
complete, noisy data. 

Transient searches in time series data are uniquely sen- 
sitive to impulsive disruptions from instrumental gain 
variations and Radio Frequency Interference (RFI). Such 
phenomena have less effect on imaging studies since they 
generally disappear during correlation. However, impul- 
sive noise is similar in character to single pulse transients 
making it a significant practical challenge to real-time 
detection. Investigators generally treat interference ex- 
cision and source detection separately; they first remove 
contaminated segments and then detect candidate signals 
in the remainder. Typical excision algorithms use indica-
tors like atypical spectral kurtosis (Deller 2010), the lack
of frequency dispersion (Deneva et al. 2009), or the nar-
row bandwidth typical of artificial sources (Bhat et al.
2005). Median filtering is often used to mitigate im-
pulsive noise. In practice most arrays also use ad hoc
rules for site-specific interference. However, it is al-
ways difficult to excise interference entirely, and tran-
sient sources are so rare that occasional terrestrial sig- 
nals easily dominate the effective sensitivity achievable 
for a given archiving budget. Very Long Baseline con- 
figurations with distributed stations further increase ex- 
posure to hardware faults and RFI. Future installations 
like the Square Kilometre Array will have dramatically 
larger scale and complexity but the number of human 
analysts for manual post analysis will remain relatively 
constant, making interference mitigation a vital enabling 
technology (Ellingson 2004). More generally, a principled
approach to disambiguate interference will be important 
for validating any positive detections. 

This work exploits geographic separation to adaptively 
and jointly classify interference, background noise, and 
novel transient signals. In general, interference is statis- 
tically independent at widely-separated stations. Detec- 
tors can exploit this principle to discriminate terrestrial 
events in real-time without computationally expensive 
coherent analysis. Geographic separation enables unam- 
biguous classification of non-terrestrial sources, making 
very long baseline configurations especially valuable for 
fast transient surveys. Most previous studies of the tran-
sient detection problem treat the single-dish or single-
station case (Fridman 2010). At least one other investi-
gation has used dual-station detection for RFI excision
(Bhat et al. 2005). Bhat et al. use two stations' indepen-
dent detections, comparing the final event lists to create
an RFI excision mask. Our work explores the most gen- 
eral formulation of online joint RFI excision and source 
detection incorporating the detected signal strength at 
all stations simultaneously, for observations collected at 
many distributed locations. 

This work focuses on incoherent transient detection, 
where received signals are channelized in a spectrome- 
ter and squared, distinct from the more computation- 
ally expensive coherent approaches using phase informa-
tion. Figure 1 shows the basic components of a multiple-
station incoherent transient detection system. Here a set 
of n stations observes some common source, and stores 
the complete raw voltage data to a rolling buffer. This 
voltage data is then transformed by channelization and 
squaring into a matrix of incoherent power measure- 
ments at discrete frequencies and timesteps. A tran- 
sient detection system analyzes the n independent data 
streams, searches for probable events, and triggers occa- 
sional transfers from the buffer into a permanent archive 
whenever it discovers a likely candidate. Note that while 
the term "detection" is often used to describe the initial 
squaring operation, our use of the term always refers to 
the final promote/discard decision. 

This paper describes several detection algorithms and 
characterizes their performance with respect to both de-
tection sensitivity and resilience to RFI. It then presents
experimental trials using the software correlator of the
Very Long Baseline Array (VLBA) (Deller et al. 2007,
2010). We describe V-FASTR, a commensal real-time
detection system developed as a precursor to the Aus-
tralian SKA Pathfinder Project (Wayth et al. 2011). V-
FASTR incorporates multiple stations and adapts to dy-
namic antenna configurations and RFI conditions. Its
observations of pulsar B0329+54 demonstrate sensitiv-
ity improvements from on-line adaptivity and multiple-
station synthesis. This suggests that multiple-station al-
gorithms might improve performance of commensal tran-
sient searches by other geographically distributed instru-
ments, such as the future Square Kilometre Array.

TABLE 1

Taxonomy of incoherent transient detection methods, based on Bhat et al. (2005)

2. DETECTION METHODS 

Bhat & Cordes (2005) classify transient detection
strategies according to the number of independent beams
and stations involved; multiple-pixel detection uses sev-
eral fields of view while multiple-station detection uses
several geographic locations that observe a common tar-
get. Table 1 expands their taxonomy with recent and
anticipated transient detection projects. This work deals 
with multiple-station transient detection, applicable to 
VLBI instruments as well as future installations such as 
ASKAP, MeerKAT, and the SKA. The VLBA transient 
project is the first detection system excising RFI with 
more than two separate locations. The following sec- 
tions present several basic flavors of multiple-station al- 
gorithms, first establishing notation for the single-station 
case and then extending this framework to incorporate 
many geographic locations. 

2.1. Single-station detection 

We consider the output of a single station to be a function
of both time and frequency, $S(t, \nu)$. Here, $S$ could be
either the voltage or, anticipating discussion below, the
autocorrelation of the voltage. More generally, the output
could have other dependencies such as polarization. We
consider a function $f$ that operates on a subset of $S$, for
instance, a short segment of the data in time or a
restricted frequency range. Formally we can represent each
segment as a real-valued vector $\mathbf{x}$. Here $\mathbf{x}$ is a
multivariate data point comprised of the frequency and time
values in a single segment. We seek a single-valued
discriminant function $f(\mathbf{x})$ from which transients can be
detected, in that $f(\mathbf{x})$ is large if an astronomical
transient is present and $f(\mathbf{x})$ is small if the data
contain only noise or RFI. Promising segments whose values
exceed a user-defined threshold $\tau$ are promoted to
archival storage and coherent analysis.

Fig. 1. - Multi-station architecture for transient detection. An incoherent dedispersion search produces a separate time series for each
station and candidate DM. The detector stage analyzes the streams from all stations and makes real-time decisions about which time
segments to promote.

Transient signals are distorted by dispersion from their
passage through the interstellar medium. This mani-
fests as a frequency-dependent time delay that is in-
versely proportional to the square of the signal's frequency.
In the relation given by Lyne & Graham-Smith (1998),
$\Delta\nu$ is the observation bandwidth and DM is the
Dispersion Measure of the source, a quantity represent- 
ing the integrated free electron density along the line 
of sight. A broadband pulse experiences a different de- 
lay at each frequency and is thus distorted into a time- 
swept curve. Detecting broadband pulses requires a filter 
that matches this dispersion sweep in the time/frequency 
domain. Equivalently, one can correct the delay inde-
pendently in each frequency with dedispersion, and sub-
sequently apply a simultaneous matched filter over all
channels (Bhattacharya 1998; Deneva et al. 2009). Typ-
ically the DM is not known in advance, but searching
over a set $\mathcal{D}$ of DM values provides sensitivity to many
possible dispersion profiles. Typical observed DMs range
from zero for terrestrial events, through intermediate
values for local sources, to the largest values for sources
near the Galactic center where the interstellar medium
is dense. Negative dispersion measures do not correspond
to any anticipated natural phenomenon, but they may
also be tested for false detection statistics and relevance
for Extraterrestrial Intelligence (ETI) investigations.
The optimal DM search spacing is related to the frequency
range, filterbank channel width, and time resolution
(Cordes & McLaughlin 2003). DMs can be searched in
parallel, so the transformation is amenable to multi-core
software solutions. Other methods for real-time dedispersion
include GPU or FPGA processing (von Korff et al. 2009)
or efficient caching structures such as Taylor trees
(Taylor 1974).
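
As an illustration of the incoherent dedispersion search described above, the following is a minimal sketch rather than the system's actual implementation. It assumes the standard cold-plasma dispersion constant (about 4.149 ms GHz^2 cm^3 / pc), a filterbank array of shape (channels, timesteps), and a boxcar filter one sample wide; the names `dispersion_delay_ms`, `dedisperse`, and `dm_search` are hypothetical.

```python
import numpy as np

# Standard cold-plasma dispersion constant in ms GHz^2 cm^3 / pc.
# Assumed here for illustration; it is not quoted in the text.
K_DM_MS = 4.148808


def dispersion_delay_ms(dm, freq_ghz, ref_ghz):
    """Delay of each frequency relative to the reference frequency, in ms."""
    return K_DM_MS * dm * (freq_ghz ** -2 - ref_ghz ** -2)


def dedisperse(power, freqs_ghz, dm, dt_ms):
    """Shift every channel by its dispersive delay and sum over frequency.

    power: array of shape (n_chan, n_time) holding squared, channelized data.
    Returns a single band-summed time series for the trial DM.
    """
    ref = freqs_ghz.max()
    shifts = np.rint(dispersion_delay_ms(dm, freqs_ghz, ref) / dt_ms).astype(int)
    series = np.zeros(power.shape[1])
    for channel, shift in zip(power, shifts):
        # np.roll wraps around at the edges, which is acceptable for a sketch.
        series += np.roll(channel, -shift)
    return series


def dm_search(power, freqs_ghz, dms, dt_ms):
    """Peak of the dedispersed series for each trial DM in the search set."""
    return np.array([dedisperse(power, freqs_ghz, dm, dt_ms).max() for dm in dms])
```

Each trial DM is independent of the others, so the loop in `dm_search` is the part that would be distributed across cores in a parallel implementation.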

We use the operator $\phi(\mathbf{x}, d)$ to signify a matched filter
shaped for dispersion to a specific DM $d$. The detection
decision can be independent for each dedispersed segment,
leading naturally to the classical maximum likelihood
discriminant function over the set $\mathcal{D}$ of candidate DMs:

$$\text{promote if } f(\mathbf{x}) > \tau \quad \text{for} \quad f(\mathbf{x}) = \max_{d \in \mathcal{D}} \phi(\mathbf{x}, d) \qquad (2)$$



Disregarding interference, in the ideal case both sky
and instrument noise in the time domain are gaussian-
distributed. After squaring and integration the summed
samples follow a chi-squared distribution $\chi^2_m$ with many
degrees of freedom, which we can approximate by an-
other gaussian. This leads to the following expression
for the minimum detectable intrinsic peak flux density
(Bhattacharya 1998):

$$S_{\min} = \left(\frac{\tau}{\sigma}\right) \frac{S_{\mathrm{sys}}}{\sqrt{N_{\mathrm{pol}}\,\Delta\nu\,\Delta t}} \qquad (3)$$

Here $S_{\mathrm{sys}}$ is the system-equivalent flux density, $N_{\mathrm{pol}}$ is
the number of polarizations, $\Delta\nu$ is the total bandwidth
of the filter, and $\Delta t$ is the intrinsic pulse duration. The
term $(\tau/\sigma)$ is simply the SNR detection cutoff (typically
5), with $\sigma$ the standard deviation of $f(\mathbf{x})$ for noise. One
can easily calculate sensitivities for a known false alarm
rate and receiver sensitivity.
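
The sensitivity expression can be evaluated directly. The sketch below is a worked example under assumed inputs: the function name `min_detectable_flux` and the numbers used in the usage note (a 300 Jy system-equivalent flux density, 32 MHz of bandwidth, a 1 ms pulse) are hypothetical and chosen only for illustration. The optional `n_stations` argument anticipates the summation rule of Section 2.2.1, where the same expression gains a factor of n under the square root.

```python
import math


def min_detectable_flux(s_sys, n_pol, bandwidth_hz, pulse_s,
                        snr_cutoff=5.0, n_stations=1):
    """Minimum detectable peak flux density, in the units of s_sys.

    snr_cutoff plays the role of (tau / sigma) in equation (3).
    n_stations = 1 gives the single-station case; larger values correspond
    to incoherently summing that many equally sensitive stations.
    """
    samples = n_pol * bandwidth_hz * pulse_s * n_stations
    return snr_cutoff * s_sys / math.sqrt(samples)
```

With the hypothetical inputs above, `min_detectable_flux(300.0, 2, 32e6, 1e-3)` evaluates to roughly 5.9 Jy, and quadrupling the number of stations halves that figure.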

This sensitivity estimate is invalid when additive in-
terference makes the detector output non-gaussian. We
model occasional impulsive RFI by a distribution $G_0$, and
a real transient source by $G_1$. For the hypothesis $H_0$ that
the segment is noise, and $H_1$ that the event is a transient,
we have:

$$\mathbf{x}\,|\,H_0 \sim \chi^2_m + G_0 \quad \text{and} \quad \mathbf{x}\,|\,H_1 \sim \chi^2_m + G_1 \qquad (4)$$



Most segments do not contain RFI so $G_0$ has a large prob-
ability mass at 0. However, even occasional interference
can quickly dominate detections making it a major prac- 
tical impediment to the effective survey yield. Therefore, 
in addition to quantifying a detector's flux sensitivity it 
is also useful to examine its power as a classifier, i.e. its 
ability to discriminate between true events and interfer- 
ence. 

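We measure classification performance with a quantity
from decision theory known as the expected loss $E[\mathcal{L}]$.
This incorporates the cost $C_{FP}$ of any false positive de-
tections and the cost $C_{FN}$ of all false negatives:

$$E[\mathcal{L}] = \int C_{FP}\, p(f(\mathbf{x}) > \tau)\, p(\mathbf{x}|H_0)\, p(H_0)\, d\mathbf{x} + \int C_{FN}\, p(f(\mathbf{x}) < \tau)\, p(\mathbf{x}|H_1)\, p(H_1)\, d\mathbf{x} \qquad (5)$$

We collapse unknown but static terms into constants $\alpha_i$:

$$E[\mathcal{L}] \propto \alpha_0\, p(f(\mathbf{x}) > \tau \,|\, H_0) + \alpha_1\, p(f(\mathbf{x}) < \tau \,|\, H_1) \propto \alpha\, p(f(\mathbf{x}) > \tau \,|\, H_0) + p(f(\mathbf{x}) < \tau \,|\, H_1)$$

Since $p(f(\mathbf{x}) < \tau \,|\, H_1) = 1 - p(f(\mathbf{x}) > \tau \,|\, H_1)$, this reduces
under a monotonic transform to a weighted difference between
the false positive rate and the true positive rate:

$$E[\mathcal{L}] \sim \alpha\, p(f(\mathbf{x}) > \tau \,|\, H_0) - p(f(\mathbf{x}) > \tau \,|\, H_1) \qquad (6)$$

A joint detection and excision rule is simply a discrimi-
nant function $f(\mathbf{x})$ that aims to optimize this objective.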

One can compare classification performance under dif-
ferent cost assumptions using a Receiver Operating Char-
acteristic (ROC) curve such as Figure 2. The ROC plot
shows all possible true and false positive rates for differ-
ent choices of $\tau$. For a given possible false alarm tolerance
(i.e., a particular $\alpha$ value in equation 6), the best perfor-
mance achievable for the discriminant lies somewhere on
the ROC line. A random detection rule, which provides
no information about the RFI/transient distinction, cor- 
responds to a diagonal line with equal false positive and 
true positive rates. Better detection results approach the 
upper left region. 

This example shows a simulation where a hypothet- 
ical time series is dedispersed to the correct DM for 
each of 32 independent frequency channels so that no 
residual time delay remains. Matched filtering yields a 
background noise signal. We synthesize a dataset of 
10000 timesteps with additive RFI and additive transient 
pulses in equal proportion. Both kinds of events have a 
single timestep width, and a constant SNR is used so 
that the only random element is measurement noise. If 
RFI pulses are weaker on average than transients, then 
the detection rule favors true transient events and the 
curve approaches the upper left. 

The area under the ROC curve is commonly used as 
a figure of merit to summarize the ROC performance. 
However, for the transient detection task, very few can- 
didates can be promoted so the majority of this area is 
not relevant. Instead, the most important aspect of per- 
formance is the ROC in the regime of extremely low false 
positive rates. For our observational study that follows 
we will consider the area under the curve for a false pos- 
itive rate of < 0.01, corresponding to a detection rule 
that promotes 1% of all time segments. ROC analysis, 
together with sensitivity under perfect observing condi-
tions (e.g. equation 3), permits principled comparisons
of single- and multiple-station detection strategies. 
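
The ROC analysis and the restricted area under the curve can be sketched as follows. This is an illustrative implementation under stated assumptions, not the evaluation code used for the figures: the function names `roc_curve` and `partial_auc` are hypothetical, scores for RFI-only segments and transient segments are supplied as two separate arrays, and the area is accumulated with the trapezoid rule after clipping the false positive axis at `max_fpr`.

```python
import numpy as np


def roc_curve(neg_scores, pos_scores):
    """False and true positive rates for every threshold tau.

    neg_scores: discriminant values f(x) for segments without a transient.
    pos_scores: discriminant values f(x) for segments with a transient.
    A segment is promoted when its score is >= tau.
    """
    neg = np.sort(np.asarray(neg_scores, dtype=float))
    pos = np.sort(np.asarray(pos_scores, dtype=float))
    # Sweep thresholds from strictest to loosest.
    taus = np.sort(np.concatenate([neg, pos]))[::-1]
    # searchsorted(..., side="left") counts the scores strictly below tau.
    fpr = 1.0 - np.searchsorted(neg, taus, side="left") / neg.size
    tpr = 1.0 - np.searchsorted(pos, taus, side="left") / pos.size
    # Prepend the (0, 0) corner, where nothing is promoted.
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])


def partial_auc(fpr, tpr, max_fpr=1.0):
    """Trapezoid-rule area under the ROC curve for fpr <= max_fpr."""
    clipped = np.minimum(fpr, max_fpr)
    heights = 0.5 * (tpr[1:] + tpr[:-1])
    return float(np.sum(np.diff(clipped) * heights))
```

Calling `partial_auc(fpr, tpr, max_fpr=0.01)` restricts the figure of merit to the low false positive regime discussed above. The unnormalized area then has a maximum of 0.01, so one may wish to divide by `max_fpr` when comparing methods.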

2.2. Multiple- station detection algorithms 

In the general multiple-station case with n geographic
locations, signals are dedispersed and filtered indepen-
dently for each station and DM. This gives a combined
data set of size $|\mathcal{D}| \times n$ at every timestep. Geographic sep-
aration can assist with detection because transients are
correlated across multiple stations, while (local) RFI is 
not. Here we present several alternative families of multi- 
station detection algorithms. They vary in computa- 
tional complexity, accuracy, and ease of implementation. 
For simplicity we assume that all receivers have similar 
sensitivity, though multiple-station methods could also 
benefit more diverse systems such as LOFAR and the 
SKA that have centralized concentrations of collecting 
area. 

2.2.1. Sum of Signals 

For incoherent detection with a matched filter, one
achieves maximum sensitivity by summing the dedis-
persed signal over all stations as in Hessels et al. (2009).
The discriminant function is the maximum of all such
sums at each DM:

$$f_{\mathrm{sum}}(\mathbf{x}) = \max_{d \in \mathcal{D}} \frac{1}{n} \sum_{a=1}^{n} \phi(\mathbf{x}_a, d) \qquad (7)$$

For n stations this rule provides a $\sqrt{n}$ improvement in
detection sensitivity:

$$S_{\mathrm{sum}} = \left(\frac{\tau}{\sigma}\right) \frac{S_{\mathrm{sys}}}{\sqrt{N_{\mathrm{pol}}\,\Delta\nu\,\Delta t\; n}} \qquad (8)$$

Fig. 2. - ROC curve showing classification performance on a
simulated dataset, for single-station detection under different RFI
environments. (Panel title: ROC performance, RFI discrimination
for 5σ pulses; horizontal axis: false positive rate.)

Figure 3 portrays this detection strategy. It shows a
simplified case with RFI and transient signals for just 
two stations. The axes show stations' matched fil- 
ter responses in arbitrary units after dedispersion to 
the appropriate DM. Scattered points show noise, RFI- 
contaminated, and true transient segments drawn from 
the basic model. We use unrealistic proportions of event 
types in order to show a significant number of each. RFI 
appears in just one station at a time, so these points lie 
close to the axes. Transient events show large signals at 
all stations. 

The summation rule corresponds to a discriminant 
function with hyperplane isocontours. Any specific 
choice of detection threshold $\tau$ defines one isocontour as
the decision boundary separating detected and rejected
data. Figure 3 shows a typical decision boundary with
the line labeled Sum. This illustration demonstrates why 
summation can never capture all transient events with- 
out also including some interference. Geographically sep- 
arated stations could actually magnify this effect, since 
each additional geographic location brings a new RFI 
environment. 

A simple simulation demonstrates the summation
rule's performance with different numbers of stations.
Transients appear with equal magnitude at all stations,
such that each station's signal taken individually has
SNR ranging randomly and uniformly from 1 to 2. RFI
events have SNR ranging from 5 to 10, with all power
concentrated at one random station. This scenario dis-
counts several kinds of interference that are more difficult
to model, such as periodic or switched-mode interference
patterns. We simulate 30000 timesteps with equal por-
tions of transient and RFI events. The ROC curves in
Figure 4 show detection performance as an RFI/transient
classifier. Not surprisingly, the classification power de-
pends purely on the total signal so it must accumulate
several stations before it outperforms random selection.
A coherent detection system summing voltage values in-
stead of accumulated powers could have different perfor-
mance characteristics.

Fig. 3. - "Decision boundaries" of multiple-station detection
methods, illustrated in the simplest case of two stations. The axes
correspond to the signal strength at each station. RFI yields a
strong response at one location, while actual transient signals ap-
pear at multiple stations simultaneously.

Fig. 4. - ROC curve showing false positive and true positive
rates for discriminating weak transients from impulsive RFI. The
simulation uses the basic summation rule as a discriminant. The
data set consists of 30,000 timesteps containing an equal number
of transients (SNR 1-2) and RFI events (SNR 5-10).
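
The summation rule and the simulation just described can be sketched in a few lines. This is a minimal illustration under assumptions that go beyond the text: unit-variance gaussian noise at every station, a single trial DM, and the hypothetical names `f_sum` and `simulate`. The array layout `phi[..., d, a]` (DM index, then station index) is also a choice made here.

```python
import numpy as np


def f_sum(phi):
    """Summation discriminant: mean over stations, then maximum over DMs.

    phi[..., d, a] is the matched-filter response for DM index d, station a.
    """
    return np.asarray(phi, dtype=float).mean(axis=-1).max(axis=-1)


def simulate(n_stations, n_events=1000, seed=0):
    """Scores for simulated transients and RFI events (one DM, unit noise)."""
    rng = np.random.default_rng(seed)
    shape = (n_events, 1, n_stations)
    # Transients: the same SNR in [1, 2] appears at every station.
    transients = rng.standard_normal(shape)
    transients += rng.uniform(1.0, 2.0, size=(n_events, 1, 1))
    # RFI: SNR in [5, 10], all power at one randomly chosen station.
    rfi = rng.standard_normal(shape)
    hit = rng.integers(n_stations, size=n_events)
    rfi[np.arange(n_events), 0, hit] += rng.uniform(5.0, 10.0, size=n_events)
    return f_sum(transients), f_sum(rfi)
```

Under these assumptions the mean transient score stays near 1.5 regardless of n, while the mean RFI score falls as 7.5/n, which is why several stations are needed before the sum separates the two classes.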

2.2.2. RFI Masks 




Fig. 5. — ROC curve for the RFI mask method. 



Bhat et al. (2005) demonstrate a dual-station algo-
rithm for RFI excision during joint observations at
Arecibo and the Green Bank Telescope. They threshold
the signals at each station independently and compare
the resulting event lists to yield an RFI mask. We gener-
alize this approach to the many-receiver case by masking
all events not detected in at least two stations:

$$f_{\mathrm{mask}}(\mathbf{x}) = \max_{d \in \mathcal{D}} \begin{cases} 0 & \text{if } g(\mathbf{x}, d) \le 1 \\ \max_{a} \phi(\mathbf{x}_a, d) & \text{if } g(\mathbf{x}, d) > 1 \end{cases} \qquad (9)$$

The number of stations' signals exceeding $\tau$ is given by:

$$g(\mathbf{x}, d) = \left|\{\, a : \phi(\mathbf{x}_a, d) > \tau_a,\ 1 \le a \le n \,\}\right| \qquad (10)$$



The stations might have dissimilar RFI environments, 
prescribing a different threshold for each. This mask- 
ing operation provides near-perfect RFI excision. How- 
ever, any detection of a real transient must occur in- 
dependently in each receiver. Therefore the minimum 
flux sensitivity is identical to the single-station case. We 
have: 



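$$S_{\mathrm{mask}} = \left(\frac{\tau}{\sigma}\right) \frac{S_{\mathrm{sys}}}{\sqrt{N_{\mathrm{pol}}\,\Delta\nu\,\Delta t}} \qquad (11)$$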



Figure 5 shows ROC performance as an RFI/transient
classifier for varying numbers of stations. Additional sta- 
tions do not improve performance; the overall detection 
sensitivity of the system is always equal to the second 
most sensitive station. 
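
A compact sketch of the masking rule follows. It is illustrative only: the function name `f_mask` is hypothetical, the masked value is taken to be zero (an assumption), and thresholds may be given either as one scalar or as one value per station to reflect dissimilar RFI environments.

```python
import numpy as np


def f_mask(phi, tau):
    """RFI-mask discriminant for one timestep.

    phi: array of shape (n_dm, n_stations) with matched-filter responses.
    tau: scalar threshold, or one threshold per station.
    """
    phi = np.asarray(phi, dtype=float)
    # Number of stations whose response exceeds its threshold, per DM.
    g = (phi > np.asarray(tau, dtype=float)).sum(axis=1)
    # Keep the strongest response only where at least two stations agree;
    # masked DMs contribute zero (an assumption of this sketch).
    per_dm = np.where(g > 1, phi.max(axis=1), 0.0)
    return float(per_dm.max())
```

For example, a strong response at a single station, such as `f_mask([[6.0, 0.1, 0.2]], 5.0)`, is masked, whereas `f_mask([[6.0, 7.0, 0.2]], 5.0)` passes because two stations exceed the threshold.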

2.2.3. Robust Sum 

Signals from geographically distributed stations are 
tantamount to Independent and Identically Distributed 
(IID) samples from a common univariate process. 
From this perspective RFI events are outliers that 
can be mitigated with robust estimation strategies. 
Examples include trimmed and Winsorized estimators
(Leonowicz et al. 2005). The two-tailed trimmed esti-
mator removes the strongest and weakest k stations to
produce the following discriminant function. With sta-
tions ordered by signal strength,

$$f_{\mathrm{robust}}(\mathbf{x}) = \max_{d \in \mathcal{D}} \frac{1}{n - 2k} \sum_{a=k+1}^{n-k} \phi(\mathbf{x}_a, d) \qquad (12)$$

Fig. 6. - ROC curve for the robust estimator.



The two-tailed trimmed estimator requires at least three 
stations, but the one-tailed version (which simply excises 
the strongest signal) gives a meaningful result for just two 
stations. They produce axis-parallel decision boundaries 
like the line labeled Robust in Figure 3.

Trimming stations incurs a sensitivity penalty. One
can characterize sensitivity using the empirical standard
deviation $\sigma_{\mathrm{robust}}$; the corresponding gaussian gives the
absolute signal strength at a given rejection threshold.

Figure 6 shows the robust estimator's ROC perfor-
mance for varying numbers of stations. In the three-
station case performance is equal to the RFI masking 
approach since both methods use the second-strongest 
signal. For four or more stations the robust estimator 
sums multiple measurements, increasing flux sensitivity 
and improving performance. 

The trimmed decision rule requires signals to be sorted 
for each timestep and dispersion measure. This operation 
is computationally tractable for existing very long base- 
line configurations. A separate issue is the choice of the 
number of stations k to trim at each timestep. The best
k minimizes the expected loss from Equation 6, which de-
pends on data quality and the likelihood of simultaneous
RFI events in multiple stations. Ideally most timesteps
exhibit "clean" gaussian noise, with RFI appearing only
occasionally and in one station at a time. In this case
k = 1 removes the RFI while maintaining maximum sen-
sitivity to weak pulses. If interference is not perfectly
independent or if a receiver suffers from persistent noise
conditions then simultaneous interference events could
occur. These situations would benefit from setting k > 1.
If noise conditions change, one can find the current best
setting for k on-line by injecting synthetic pulses into the
data stream and then attempting to detect them using
several trimming values.
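
The trimmed rule amounts to a sort along the station axis followed by a mean over the retained stations. The sketch below is one possible reading, with the hypothetical name `f_trimmed`; the `two_tailed` flag switches between removing both the k strongest and k weakest stations and removing only the k strongest.

```python
import numpy as np


def f_trimmed(phi, k=1, two_tailed=True):
    """Trimmed-mean discriminant.

    phi[..., d, a] is the matched-filter response for DM index d, station a.
    The k strongest stations are always removed; the k weakest are removed
    as well when two_tailed is True.
    """
    ordered = np.sort(np.asarray(phi, dtype=float), axis=-1)  # weakest first
    n = ordered.shape[-1]
    lo = k if two_tailed else 0
    hi = n - k
    if hi - lo < 1:
        raise ValueError("too few stations for this trimming value k")
    return ordered[..., lo:hi].mean(axis=-1).max(axis=-1)
```

With three stations and k = 1 the two-tailed form returns the second-strongest response, consistent with the equivalence to the masking approach noted above. Sweeping k over injected synthetic pulses, as suggested in the text, would reuse this function unchanged.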

2.2.4. Ensemble Estimate of CDF 

The Ensemble CDF (ECDF) rule mitigates RFI with 
a monotonic transformation of signal strengths that re- 
duces the influence of extreme values from any single 
station. Specifically, it estimates the probability that an 
observed signal exceeds a random typical timestep whose 
magnitude is the random variable X: 

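$$f_{\mathrm{ecdf}}(\mathbf{x}) = \max_{d \in \mathcal{D}} F(\phi(\mathbf{x}, d)) = \max_{d \in \mathcal{D}} p\big(\phi(X, d) < \phi(\mathbf{x}, d)\big) = \max_{d \in \mathcal{D}} \frac{1}{n} \sum_{a=1}^{n} p\big(\phi(X_a, d) < \phi(\mathbf{x}_a, d)\big) \qquad (14)$$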

Each station maintains a separate probability estimate 
p, so strong signals have less influence at stations with 
chronic RFI. The expectation of this probability over all 
stations constitutes an ensemble estimate of the Cumu-
lative Distribution Function (CDF). The method is reminis-
cent of the mean rule for combining multiple one-class
classifiers, first suggested by Tax & Duin (2001). It dif-
fers in that we are concerned only with high-intensity
signals so we use the CDF in place of the standard den-
sity function.

One can compute the probability estimates p using any 
statistical density estimator, in advance from historical 
data or online from the current time series. One practical 
on-line approach is to maintain an ordered list of recent 
signal strengths. A binary search finds the percentile 
rank of a new observation, which provides an empirical
CDF value (Wasserman 2006). Updating the ordered
list requires an O(log n) insertion operation. If constant-
time computation is desired, one can discretize the CDF
into k unique values and store just k-tile signal strengths.
One can also reduce the computational burden by using 
a single probability estimate for multiple DMs. 
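
The on-line approach with an ordered list can be sketched using Python's standard `bisect` module. Everything named here (`OnlineECDF`, `f_ecdf`, the sliding `window` length, and the 0.5 returned before any data arrive) is an assumption made for illustration. One caveat: the binary search is logarithmic, but inserting into a plain Python list still shifts elements, so a balanced tree or the discretized variant mentioned above would be needed for strict logarithmic or constant-time updates.

```python
import bisect
from collections import deque


class OnlineECDF:
    """Empirical CDF over a sliding window of recent signal strengths."""

    def __init__(self, window=1000):
        self.window = window
        self.recent = deque()     # values in arrival order
        self.sorted_vals = []     # the same values in ascending order

    def cdf(self, value):
        """Fraction of stored values strictly below the given value."""
        if not self.sorted_vals:
            return 0.5            # uninformative before any data (assumption)
        rank = bisect.bisect_left(self.sorted_vals, value)
        return rank / len(self.sorted_vals)

    def update(self, value):
        """Add a new observation, discarding the oldest if the window is full."""
        if len(self.recent) == self.window:
            old = self.recent.popleft()
            del self.sorted_vals[bisect.bisect_left(self.sorted_vals, old)]
        self.recent.append(value)
        bisect.insort(self.sorted_vals, value)


def f_ecdf(phi, estimators):
    """Ensemble CDF discriminant for one timestep.

    phi[d][a] is the response for DM index d and station a; estimators[a] is
    the per-station OnlineECDF, shared across DMs to reduce computation.
    """
    return max(
        sum(est.cdf(v) for est, v in zip(estimators, row)) / len(row)
        for row in phi
    )
```

Because each station contributes at most 1/n to the average, a single extreme value cannot push the score above what the remaining stations support.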

It can be shown that the ensemble estimator retains 
optimal flux sensitivity for detecting weak sources. The 
discriminant rule is a sample average of a CDF which 
is concave wherever values are larger than the average 
noise. Jensen's inequality can be used to show that sensitiv-
ity is preserved in this region of interest. A demonstra-
tion appears in the Appendix. In brief, the CDF of the
on-source mean remains constant in expectation, while
the CDF of off-source RMS preserves 1/n improvements
in noise variance. Absent practical concerns about dis- 
cretization or accuracy in extreme tail regions, the rule 
maintains sensitivity to the weakest signals.

Figure 7 shows ROC performance for the ensemble
CDF function with varying numbers of stations. It un-
derperforms with just a few stations, but soon overtakes
the robust estimator as the number of stations increases
past four. A potential advantage of the ensemble ap-
proach is that computing an independent probability
score for each station compensates automatically for any
systematic differences in their background signals or noise
environments. Finally, Figure 3 shows a typical decision
boundary. Transforming all signals to the [0, 1] interval
down-weights the most extreme signal values and im-
proves RFI rejection.

Fig. 7. - ROC curve for ensemble CDF discriminant.

2.2.5. Quadratic Discriminant 

With examples of both transient and non-transient 
phenomena the detection problem becomes a traditional 
supervised classification task to find an optimal decision 
boundary separating two labeled data classes. Typical 
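machine learning solutions include Neural Networks or
Support Vector Machines (Bishop 2006). Here we ex-
plore a simple quadratic decision boundary, which is opti-
mal if on-source and off-source distributions can be char-
acterized by multivariate gaussian PDFs. The discrim-
inant function is defined by a mean $\mu$ and a positive
definite covariance matrix $\Sigma$, with $\phi(\mathbf{x}, d)$ here denoting
the vector of all stations' filter responses:

$$f_{\mathrm{quad}}(\mathbf{x}) = \max_{d \in \mathcal{D}} \big(\phi(\mathbf{x}, d) - \mu\big)^{T} \Sigma^{-1} \big(\phi(\mathbf{x}, d) - \mu\big) \qquad (16)$$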



This supervised method can potentially achieve superior 
performance due to strong assumptions about the statis- 
tical properties of the source. It accounts explicitly for 
sources' strengths and optimizes its decision boundary to 
the precise level of noise and degree of correlation across 
stations. 
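
The quadratic form above can be evaluated with a linear solve rather than an explicit matrix inverse. The sketch below is illustrative and rests on assumptions not stated in the text: the mean and covariance are estimated from labeled training vectors of a single class (for instance off-source segments), and the names `fit_gaussian` and `f_quad` are hypothetical.

```python
import numpy as np


def fit_gaussian(samples):
    """Sample mean and covariance of training vectors, shape (n_samples, n_stations)."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0), np.cov(samples, rowvar=False)


def f_quad(phi, mu, cov):
    """Quadratic discriminant for one timestep.

    phi: array of shape (n_dm, n_stations) with matched-filter responses.
    Returns the largest quadratic form (phi - mu)^T cov^{-1} (phi - mu) over DMs.
    """
    diff = np.asarray(phi, dtype=float) - mu
    # Solve cov * y = diff^T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, diff.T).T
    return float(np.sum(diff * solved, axis=1).max())
```

A non-diagonal covariance lets the rule account for correlated responses across stations, which is the property the text highlights; with an identity covariance and zero mean the score reduces to the squared length of the response vector.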

If its training assumptions are satisfied the sensitiv- 
ity of the quadratic discriminant function is at least as 
good as the standard summation. For example, in the 
RFI-free case, both noise and pulse distributions are 
truly gaussian with equivalent diagonal covariance ma- 
trices — only the means differ. Here the optimal de- 
cision boundary separating the two classes is a hyper- 
plane. More generally, the quadratic discriminant is 
optimally-sensitive as long as the data satisfies its as- 
sumption of gaussian-distributed classes. In these cases 
a non-diagonal quadratic form is the proper likelihood 
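ratio. Figure 8 shows ROC performance.

$$S_{\mathrm{quad}} = \left(\frac{\tau}{\sigma}\right) \frac{S_{\mathrm{sys}}}{\sqrt{N_{\mathrm{pol}}\,\Delta\nu\,\Delta t\; n}} \qquad (17)$$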

Table 2 summarizes the Area Under the ROC Curve
(AUC) score for each of the discriminant functions and
numbers of stations. By this measure, the quadratic dis-
criminant consistently outperforms all alternatives; its
ability to discriminate impulsive RFI using just three sta-
tions is superior to the best performance offered by the
next-best rule, for ten stations. Its discrimination per-
formance would be worse for new RFI environments or
transients that did not match the training distributions.
In general it is best to estimate any detection method's
free parameters on-line using the most current data.

Fig. 8. - ROC curve for quadratic discriminant. Supervised
methods can provide superior performance if the training distribu-
tion is similar to the true data.

TABLE 2

Area Under the ROC Curve (AUC) score for each method,
for various numbers of stations in simulated trials.

Method   n=3    4      5      6      7      8      9      10
Sum      .224   .367   .500   .610   .702   .756   .815   .856
Mask     .676   .669   .679   .688   .692   .691   .700   .704
Robust   .676   .755   .818   .864   .899   .917   .937   .951
ECDF     .580   .746   .833   .885   .923   .938   .955   .967
Quad     .985   .985   .988   .989   .991   .993   .994   .995

Figure [9] shows sensitivity to weak pulses for each
method in units of flux intensity relative to a single sta- 
tion. The robust method's sensitivity is difficult to de- 
scribe analytically so we estimate it from the on-source 
mean and off-source RMS using 10000 timesteps of simu- 
lated data. Its weak signal sensitivity is nearly indistin- 
guishable from the classical summation rule for config- 
urations with five or more stations. This suggests that 
some form of robust estimation is almost always bene- 
ficial. A conservative decision to excise just one or two 
stations from the sum causes the smallest marginal sensi- 
tivity impact but produces the largest marginal improve- 
ments in interference excision. 
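The Monte Carlo estimate described above can be sketched as follows. This is an illustration under stated assumptions: the robust rule is taken to be "discard the k largest station values and average the rest", the noise is unit-variance Gaussian, and the function names are hypothetical.

```python
import random
import statistics


def robust_mean(values, k):
    """Average after discarding the k largest station values (assumed rule)."""
    kept = sorted(values)[:len(values) - k]
    return sum(kept) / len(kept)


def relative_sensitivity(n, k, signal=1.0, steps=10000, seed=0):
    """Off-source RMS divided by the on-source gain, in single-station noise units."""
    rng = random.Random(seed)
    off = [robust_mean([rng.gauss(0.0, 1.0) for _ in range(n)], k)
           for _ in range(steps)]
    on = [robust_mean([rng.gauss(signal, 1.0) for _ in range(n)], k)
          for _ in range(steps)]
    # Gain: how much of the injected signal survives in the statistic's mean.
    gain = (statistics.fmean(on) - statistics.fmean(off)) / signal
    return statistics.pstdev(off) / gain
```

With `k = 0` the statistic is the plain mean, so the result should approach $1/\sqrt{n}$; comparing it against small nonzero `k` gives a rough sense of the sensitivity cost of excising stations.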

Monte Carlo simulations can determine the sensitivity
of more complex classifiers; one can fit a mapping
from the detection probability onto signal strength using
a function approximator like a smoothing spline. We use
this approach for the quadratic discriminant in Figure [9]
and verify that sensitivity is indeed equivalent to the
optimal summation rule. Most methods in the diagram
have similar sensitivity because it describes the RFI-free
incoherent detection limit for perfect Gaussian noise.
Here a $\sqrt{n}$ improvement is the best one can achieve.

Fig. 9. — Sensitivity for each method relative to single-station
detection, in terms of noise RMS (lower is better) as the number
of stations increases.

3. VLBA OBSERVATIONS 

This section describes a case study of multiple-station
transient detection using the Very Long Baseline Array
(VLBA). The experiment is part of the V-FASTR
project, a trailblazer for the Australian Square Kilometre
Array Pathfinder's CRAFT fast transients investigation
(Macquart et al. 2010). V-FASTR has installed
a transient detection pipeline for commensal operation
alongside standard VLBA observations. A complete
description of the architecture and initial results is
provided in a companion paper (Wayth et al. 2011). For
completeness we also provide a brief overview here.

The VLBA has 10 geographically dispersed stations,
each providing a single 25 m antenna. These antennas
are distributed across North America. The longest baseline
stretches from Mauna Kea, Hawaii to Hancock, New
Hampshire but the highest concentration of stations is
in the Southwestern United States. No two stations are
within each other's local horizon, and anywhere from 2 to
10 stations may participate in an observation. Voltage
recordings are saved to disks and shipped to a central
facility in Socorro, NM, where a computing cluster running
the DiFX software correlator (Deller et al. 2007, 201…)
processes signals for imaging and post-analysis.

DiFX has been reconfigured to calculate auto- 
power spectra for each antenna, producing integrated 
frequency-channelized power measurements every 1 ms. 
An incoherent software dedispersion algorithm processes 
each station independently as in the architecture diagram 
of Figure [1]. This stage uses three commercial multicore
processors in parallel, and easily processes hundreds of 
dispersion measures in real-time. Our tests to date have 
used 190 distinct dispersion measures while consuming 
just 10-20% of the total system capacity. After dedisper- 
sion a transient detection stage processes the resulting 
time series and saves a small portion of the raw voltage 
data. Online processing is essential since any archiving 
decisions must be made before the correlation job finishes 
and the entire disk is erased for reuse. 
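The incoherent dedispersion step can be illustrated with a short sketch. It assumes channelized power arranged as `spectra[t][c]`, uses the standard cold-plasma dispersion constant, and rounds each channel delay to the nearest sample; the function names are hypothetical and this is not the V-FASTR implementation.

```python
K_DM_MS = 4.148808  # dispersion constant in ms GHz^2 cm^3 / pc


def dispersion_delay_ms(dm, freq_ghz, ref_ghz):
    """Arrival delay at freq_ghz relative to ref_ghz for dispersion measure dm."""
    return K_DM_MS * dm * (freq_ghz ** -2 - ref_ghz ** -2)


def dedisperse(spectra, freqs_ghz, dm, dt_ms=1.0):
    """Sum channels after removing each channel's delay.

    spectra[t][c] is the power at timestep t in channel c.
    Delays are measured against the highest frequency, so all shifts are >= 0.
    """
    ref = max(freqs_ghz)
    shifts = [round(dispersion_delay_ms(dm, f, ref) / dt_ms) for f in freqs_ghz]
    n_out = len(spectra) - max(shifts)
    return [sum(spectra[t + s][c] for c, s in enumerate(shifts))
            for t in range(n_out)]
```

Running `dedisperse` once per trial dispersion measure, independently for each station, yields the set of time series that the detection stage consumes.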
Fig. 10. — A typical segment taken from the third scan of the
B0329+54 observation. The actual pulses from the pulsar are
indicated by arrows at the top of the diagram.

3.1. Method 

The pulsar B0329+54 was observed in four 8 MHz-wide
bands evenly spaced from 1.4 GHz to 1.674 GHz; each
band was channelized to 0.25 MHz frequency resolution
and the resultant power spectra accumulated for 1 ms.
This frequency resolution is typical for VLBA observations,
and we use it here for fidelity to a commensal system.
After dedispersion to its known DM of 26.8 pc/cm$^3$,
the pulsar has an intrinsic pulse width of approximately
10 ms and is easy to resolve at this time resolution. The
pulsar period is approximately 714 ms; after dedispersion,
typical data resemble the segment shown in Figure [10].
This segment shows diverse interference including impulsive
noise and systemic changes in the background at
individual antennas. Such interference would probably not
significantly impact the cross-correlated measurements
for which the VLBA system was originally designed, but
it is problematic for finding short-duration events in the
high-resolution time series data.
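As a rough worked check on these numbers, the standard cold-plasma dispersion relation can be evaluated for this setup. The constants 4.149 ms GHz$^2$ pc$^{-1}$ cm$^3$ and 8.3 $\mu$s are textbook values assumed here, not quantities quoted by the source, and the band edges are taken at face value as 1.4 and 1.674 GHz.

```latex
\begin{align*}
\Delta t_{\rm band} &\approx 4.149\,{\rm ms} \times {\rm DM} \times
    \left(\nu_{\rm lo}^{-2} - \nu_{\rm hi}^{-2}\right)
  = 4.149 \times 26.8 \times \left(1.4^{-2} - 1.674^{-2}\right)\,{\rm ms}
  \approx 17\,{\rm ms}, \\
\Delta t_{\rm chan} &\approx 8.3\,\mu{\rm s} \times {\rm DM} \times
    \Delta\nu_{\rm MHz}\,\nu_{\rm GHz}^{-3}
  = 8.3 \times 26.8 \times 0.25 \times 1.4^{-3}\,\mu{\rm s}
  \approx 20\,\mu{\rm s}.
\end{align*}
```

Under these assumptions the dispersive sweep across the observed band spans roughly 17 one-millisecond samples, while smearing within a single 0.25 MHz channel stays far below the 1 ms integration time.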

The pulsar was observed simultaneously at 9 stations
over 6 contiguous observation segments, or scans, with
durations of 242 s. Scans were interspersed with
measurements of a calibrator source and spaced at
approximately 5 minute intervals. The observed pulse
strength changed during the observation sequence, with
signals weakening progressively in later scans. Several
antennas' average signal strengths also drifted slightly.
We compensated with a conservative high-pass filter,
subtracting the 100-timestep moving average from each
observation.
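The drift compensation can be sketched as a moving-average subtraction. Whether the original window was centered or trailing is not stated; this sketch assumes a centered window that shrinks at the edges, and the function name is illustrative.

```python
def highpass(series, window=100):
    """Subtract a centered moving average of up to `window` samples (window >= 2)."""
    n = len(series)
    # Prefix sums make each window average O(1).
    prefix = [0.0]
    for v in series:
        prefix.append(prefix[-1] + v)
    half = window // 2
    out = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half)
        out.append(series[t] - (prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

A 100-sample window at 1 ms resolution is long compared with a pulse of about 10 ms, so a single pulse shifts the local average only slightly while slow baseline drifts are removed.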

We assembled an authoritative event list by exploiting
pulsar periodicity. We posit that the arrangement of
actual events has just two free parameters: start time
and period. We initialized these parameters by identifying
the strongest pulses through visual inspection, and then
extrapolated the other pulses' locations from the periodicity
given by the catalog. A precise optimization of start time
and period then centered the events on the pulses by
maximizing the mean signal over all events. The typical SNR
was 16 after summing all antennas. We created a test 
data set using positive examples drawn from the center 
of each pulse, and negative examples drawn at intervals 
between pulses. We spaced the negative examples regu- 
larly at 10% of the pulsar period, which provided a large 
sample but left enough separation between pulses to pre- 
serve statistical independence and insulate the negative 
samples from finite-width pulses. We labeled timesteps 
as positive if they occurred at the right time according 
to the pulsar's rotational ephemeris, regardless of the 
actual received SNR. This created a more difficult clas- 
sification challenge and characterized discriminant algo- 
rithms' performance across many pulse strengths. Each 
scan contained approximately 380 pulse events and 3800 
negative examples. 
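The two-parameter alignment described above can be sketched as a small grid search. The grid extents, step counts, and function names are assumptions chosen for illustration; the source does not specify which optimizer was used.

```python
def mean_at_pulses(series, start, period, dt=1.0):
    """Mean signal at the timesteps predicted by (start, period), both in ms."""
    idx, k = [], 0
    while True:
        i = round((start + k * period) / dt)
        if i >= len(series):
            break
        if i >= 0:
            idx.append(i)
        k += 1
    return sum(series[i] for i in idx) / len(idx) if idx else 0.0


def refine(series, start0, period0, d_start=5.0, d_period=0.05, steps=21):
    """Grid search near an initial guess for the (start, period) maximizing the mean."""
    best = (float("-inf"), start0, period0)
    for a in range(steps):
        s = start0 - d_start + 2 * d_start * a / (steps - 1)
        for b in range(steps):
            p = period0 - d_period + 2 * d_period * b / (steps - 1)
            best = max(best, (mean_at_pulses(series, s, p), s, p))
    return best[1], best[2]
```

Once `start` and `period` are fixed, positive examples fall at the predicted pulse centers and negative examples can be taken at regular fractions of the period between them.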

We reserved the initial scan for training. This training
scan was effectively scan 0, and we omit it from
performance reports. We used the remaining scans (1
through 5) to evaluate the detection algorithms. We also
computed algorithms' performance for all five test scans
combined. We set all free parameters through optimization
on the data from the training scan, using the value
$k = 2$ for the robust estimator. The Ensemble CDF
algorithm did not require prior training; instead we estimated
the CDF using the data from each scan in progress using
a nonparametric plug-in method (Wasserman 2006).
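A plug-in CDF estimate of this kind can be sketched with the empirical distribution function. The ensemble score shown here, an average of per-station CDF values, follows the form described in the Appendix; the function names and the choice to build each CDF from that station's own recent samples are assumptions made for illustration.

```python
from bisect import bisect_right


def make_ecdf(samples):
    """Return F(x) = fraction of samples <= x (the plug-in CDF estimate)."""
    ordered = sorted(samples)
    return lambda x: bisect_right(ordered, x) / len(ordered)


def ensemble_cdf_score(station_values, station_ecdfs):
    """Average each station's ECDF value at its current measurement."""
    total = sum(F(v) for v, F in zip(station_values, station_ecdfs))
    return total / len(station_values)
```

Because each CDF value is bounded by 1, a single station with an extreme spike cannot dominate the average, whereas a pulse seen at every station pushes all terms toward 1.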

3.2. Results 

Figure [11] shows the distribution of received power for
pulse and non-pulse segments, grouped by scan and
station, and illustrated by top and bottom boxes
respectively. The boxes indicate inter-quartile ranges and
medians, with notches marking the 95% confidence
intervals. Pulse power varies across scans, but these
variations seem correlated across stations. This suggests that
the received flux actually changed, which favors
scintillation as a promising explanation (Rickett 1990). Reports
of scintillation are common in previous studies of
pulsar B0329+54. The cross-scan variability observed here
is consistent with the 20 minute diffractive scintillation
cycle observed by Semenkov et al. (2004).

Figure [12] and Table [3] show the Area Under the ROC
Curve scores for each method. Figure [12] also compares
total on- and off-source power, plotted with box and
whisker diagrams in the upper panel. The differences
in signal strength visibly affect performance. The robust
and ECDF discriminants perform best overall due to
their ability to discriminate weaker pulses in later scans.
The quadratic discriminant initially performs quite well
since the first scan falls directly after its training
example, when the characteristics of the test data are most
similar. However, this method's performance degrades
severely on later scans where the source is weaker
relative to RFI.

Fig. 11. — Spread of on-source signals (upper distributions) and
off-source signals (bottom distributions) for each of nine
antennas, over five scans. The box plots show interquartile ranges, with
notches marking the 90% confidence intervals for the median. The
distributions are better separated during the initial scans.

TABLE 3
Area Under the ROC Curve (AUC) score of pulsar
B0329+54 observations, for realistic false positive rates
(less than 0.01).

Method      Scan 1   2       3       4       5
Sum         0.826    0.747   0.446   0.087   0.279
Mask        0.765    0.679   0.312   0.180   0.295
Robust      0.823    0.824   0.612   0.388   0.446
ECDF        0.900    0.737   0.625   0.406   0.465
Quadratic   0.913    0.674   0.370   0.427   0.345

TABLE 4
Total Area Under the ROC Curve (AUC) score of pulsar
B0329+54 observations, for all false positive rates.

Method      Scan 1   2       3       4       5
Sum         0.864    0.900   0.877   0.826   0.821
Mask        0.969    0.921   0.827   0.847   0.840
Robust      0.814    0.920   0.866   0.899   0.881
ECDF        0.807    0.936   0.850   0.879   0.895
Quadratic   0.846    0.834   0.904   0.892   0.845
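For readers who want to reproduce scores of this kind, the following sketch computes an area under the empirical ROC curve restricted to low false positive rates. The normalization, dividing the partial area by the maximum false positive rate so that a perfect detector scores 1, is an assumption; the source does not spell out how its normalized values were computed.

```python
def roc_points(pos, neg):
    """Empirical ROC as a list of (fpr, tpr), sweeping the threshold downward."""
    labeled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg],
                     key=lambda item: -item[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(labeled):
        score = labeled[i][0]
        while i < len(labeled) and labeled[i][0] == score:  # ties move together
            tp += labeled[i][1]
            fp += 1 - labeled[i][1]
            i += 1
        points.append((fp / len(neg), tp / len(pos)))
    return points


def partial_auc(pos, neg, max_fpr=1.0):
    """Trapezoidal area under the ROC for fpr <= max_fpr, divided by max_fpr."""
    area = 0.0
    pts = roc_points(pos, neg)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the final segment by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr
```

Calling `partial_auc(pos, neg, 0.01)` gives the restricted score and `partial_auc(pos, neg)` the total area, which mirrors the distinction between the two tables.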

Fig. 12. — Distributions of signal (upper distributions) and
off-source segments (bottom distributions) for each scan, with boxes
corresponding to interquartile ranges and whiskers the extent of
the data sans extreme outliers. The lower panel shows normalized
Areas Under the ROC Curve from Table [3].

We consider the ROC curve in the regime of low
tolerances for false positives, and specifically
operations-relevant trigger rates that archive no more than 1% of
all candidates (although for completeness, we also report
the total AUC scores in Table [4]). In practice the system
must promote an interval of time around each
detection in order to capture the entire dispersed pulse
and provide context to characterize RFI. Figure [13] plots
the actual ROC curves of each method up to a 0.01
promotion rate. We form confidence intervals for the ROC
curve with a bootstrap (Bertail et al. 2008). Specifically,
we draw randomized resamplings of the original dataset,
recompute classifications, and from these compute the ROC
using a kernel-smoothed estimator (Wasserman 2006) of true
and false positive rates. Finally, we identify the median
ROC curves and 90% bounding coverage intervals using
the bootstrap sample.
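The bootstrap procedure can be sketched for a single point on the ROC curve. This simplified version resamples positives and negatives with replacement and reports percentile bounds; it omits the kernel smoothing mentioned in the text, and the function names and default settings are illustrative assumptions.

```python
import random


def tpr_at_fpr(pos, neg, fpr):
    """True positive rate at the threshold admitting at most `fpr` of negatives."""
    allowed = int(fpr * len(neg))
    ordered = sorted(neg, reverse=True)
    # Scores strictly above this threshold admit at most `allowed` negatives.
    thresh = ordered[allowed] if allowed < len(ordered) else float("-inf")
    return sum(p > thresh for p in pos) / len(pos)


def bootstrap_interval(pos, neg, fpr, n_boot=1000, coverage=0.90, seed=0):
    """Lower bound, median, and upper bound of tpr_at_fpr over resampled data."""
    rng = random.Random(seed)
    stats = sorted(
        tpr_at_fpr(rng.choices(pos, k=len(pos)),
                   rng.choices(neg, k=len(neg)), fpr)
        for _ in range(n_boot))
    lo = stats[int((1 - coverage) / 2 * (n_boot - 1))]
    hi = stats[int((1 + coverage) / 2 * (n_boot - 1))]
    return lo, stats[n_boot // 2], hi
```

Repeating the calculation over a grid of false positive rates traces out a median curve with a coverage band around it.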

The experiment reveals a highly significant difference
between single- and multiple-station approaches.
Multiple-station methods, such as the ECDF and robust
discriminants, promote more pulses than noise events for
similar time budgets. A realistic budget would permit
just a few false positives. The steep initial slope of
multi-station ROC curves implies superior performance in this
regime.

Improved ROC performance permits more lenient
detection thresholds and improved sensitivity. Figure [14]
shows the SNR of pulses that can be captured by each
method for different false positive tolerances. The SNR
associated with an empirical 90% probability of capture
is shown, based on the combined dataset from all scans.
We determine 95% confidence intervals with bootstrap
sampling. Note that there exists a threshold where the top
performer (the Robust estimator) captures all pulses of
SNR > 25 with 90% probability, without promoting a
single non-pulse to archival storage. The standard
summation approach promotes 40 RFI events before achieving
this effective sensitivity.

Fig. 13. — Detection performance for the B0329+54 observation,
over all scans. We focus on the ROC curve in the relevant region of
false positive rates significantly less than 0.01.

Fig. 14. — Sensitivity of discriminant functions: the SNR of
pulses that can be captured with 90% certainty, for various false
positive budgets, over all scans.

4. CONCLUSIONS 

This work presents a case study of real-time
incoherent detection of transient signals from multiple
stations. Preliminary tests with the VLBA corroborate our
theoretical analysis that uses impulsive noise and
homogeneous receivers. In these tests, multiple-station
algorithms yield significant performance benefits over
a standard summation approach.
When the tolerance for false positives is low, which is
the case for most practical installations, multiple-station
methods can achieve significant sensitivity improvements
without increasing false alarms. Coupling these
techniques with other statistical or multi-band approaches
for RFI excision might improve performance further. For
example, alternative RFI mitigation might still be
important to excise satellite signals observed simultaneously by
multiple stations.

One unexpected result from the VLBA experiment was
that the supervised learning approach (the quadratic
decision function) performed worse in practice than our
original model predicted. This discriminant relies on
prior examples of both pulses and RFI, so the observed
drift in pulse intensity over time invalidates its training
assumptions. Estimating its parameters on-line from the
most current data could mitigate this; alternatively,
stricter regularization could be used to prevent
over-fitting of the training data. Future research may also
consider more sophisticated learning algorithms with
better generalization properties, such as those that can
detect changes in the underlying phenomena (concept drift).
There are other promising avenues for improvement.
Machine learning techniques can interpret information
from features beyond the simple signal measurements
used in our tests. Discriminant functions could
incorporate multiple matched filters and dispersion measures. A
natural addition would be to consider the received signal
at DM 0, which is a strong indicator of RFI. Attempts to
expand the feature set should ensure that the resulting
discriminant function retains a simple structure to avoid
over-fitting to a single training environment and to keep
computational requirements tractable for real-time
processing.



APPENDIX 

SENSITIVITY OF ENSEMBLE ESTIMATORS 

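Here we provide a simple proof sketch that a broad class of multiple-station ensemble detection rules preserves
sensitivity. We consider a discriminant function $q$ that transforms the signal $\phi_i(x,d)$ according to some positive,
monotonically increasing, and concave function $r$ at each station independently, and then averages the result across
stations.

$$ q(x) = \frac{1}{n} \sum_{i=1}^{n} r(\phi_i(x,d)) \qquad (1) $$

The ensemble CDF estimator of section [2] falls into this category insofar as the noise CDF is positive and concave in
the region of interest (i.e., larger-than-average values). We assume a Gaussian noise distribution. The classical detector
that averages all $n$ stations yields the noise distribution $\mathcal{N}(\mu, \sigma/\sqrt{n})$. We aim to show that the ensemble estimator is
no less sensitive. In other words, for some constant on-source signal strength $\tau$:

$$ P\left(q(\mathcal{N}(\mu,\sigma)) < q(\tau)\right) \geq P\left(\mathcal{N}(\mu,\sigma/\sqrt{n}) < \tau\right) \qquad (2) $$

For the concave function $r$, Jensen's inequality provides (for some constant $c$):

$$ P\left(\frac{1}{n}\sum_{i=1}^{n} r(\mathcal{N}(\mu,\sigma)) < c\right) \geq P\left(r\left(\frac{1}{n}\sum_{i=1}^{n} \mathcal{N}(\mu,\sigma)\right) < c\right) $$

$$ P\left(q(\mathcal{N}(\mu,\sigma)) < c\right) \geq P\left(r\left(\frac{1}{n}\sum_{i=1}^{n} \mathcal{N}(\mu,\sigma)\right) < c\right) $$

$$ P\left(q(\mathcal{N}(\mu,\sigma)) < c\right) \geq P\left(r(\mathcal{N}(\mu,\sigma/\sqrt{n})) < c\right) $$

Therefore, in order to show

$$ P\left(q(\mathcal{N}(\mu,\sigma)) < q(\tau)\right) \geq P\left(\mathcal{N}(\mu,\sigma/\sqrt{n}) < \tau\right) $$

it is sufficient with transitivity to demonstrate:

$$ P\left(r(\mathcal{N}(\mu,\sigma/\sqrt{n})) < q(\tau)\right) \geq P\left(\mathcal{N}(\mu,\sigma/\sqrt{n}) < \tau\right) $$

$\tau$ is constant so we can substitute to yield:

$$ P\left(r(\mathcal{N}(\mu,\sigma/\sqrt{n})) < r(\tau)\right) \geq P\left(\mathcal{N}(\mu,\sigma/\sqrt{n}) < \tau\right) $$

If $r$ is a positive monotonically increasing function, this is a tautology.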
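The Jensen step in the argument can be checked numerically. The sketch below uses $r(x) = 1 - e^{-x}$, an increasing and concave function chosen purely for illustration (it is not the noise CDF used by the ensemble estimator), and confirms that the station average of $r$ never exceeds $r$ of the station average.

```python
import math
import random


def r(x):
    """An increasing, concave transform chosen for illustration."""
    return 1.0 - math.exp(-x)


def jensen_gap(n=9, trials=1000, seed=0):
    """Smallest value of r(mean(x)) - mean(r(x)) over random station vectors."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        q = sum(r(v) for v in x) / n      # ensemble rule: average of r
        gaps.append(r(sum(x) / n) - q)    # Jensen: never negative
    return min(gaps)
```

A nonnegative gap in every trial means the event "r of the averaged noise falls below c" always implies "the ensemble score falls below c", which is the direction of the probability inequality used in the proof sketch.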